Producing Accurate Interpretable Clusters from High-Dimensional Data

Authors

  • Derek Greene
  • Padraig Cunningham

Abstract

The primary goal of cluster analysis is to produce clusters that accurately reflect the natural groupings in the data. A second objective that is important for high-dimensional data is to identify features that are descriptive of the clusters. In addition to these requirements, we often wish to allow objects to be associated with more than one cluster. In this paper we present a technique, based on the spectral co-clustering model, that is effective in meeting these objectives. Our evaluation on a range of text clustering problems shows that the proposed method yields accuracy superior to that afforded by existing techniques, while producing cluster descriptions that are amenable to human interpretation.
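
The abstract names the spectral co-clustering model as its starting point. Below is a minimal sketch of the standard bipartite spectral co-clustering formulation (Dhillon's), assuming a non-negative document-by-term matrix with no all-zero rows or columns. It is not the authors' method. Each document and term gets exactly one label, so the overlapping-membership extension the abstract describes is not covered. The helper names `kmeans` and `spectral_coclustering`, the farthest-point initialisation, and the choice of l = ceil(log2 k) singular vectors are illustrative assumptions.

```python
import math
import numpy as np


def kmeans(points, k, n_iter=100):
    """Plain Lloyd's k-means with a deterministic farthest-point initialisation (illustrative)."""
    centroids = [points[0]]
    while len(centroids) < k:
        # Distance from every point to its nearest already-chosen centroid.
        d = np.min([np.linalg.norm(points - c, axis=1) for c in centroids], axis=0)
        centroids.append(points[np.argmax(d)])
    centroids = np.array(centroids)
    labels = np.zeros(len(points), dtype=int)
    for _ in range(n_iter):
        dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = np.argmin(dists, axis=1)
        # Recompute each centroid; keep the old one if a cluster empties out.
        new = np.array([points[labels == j].mean(axis=0) if np.any(labels == j)
                        else centroids[j] for j in range(k)])
        if np.allclose(new, centroids):
            break
        centroids = new
    return labels


def spectral_coclustering(A, k):
    """Jointly cluster rows (documents) and columns (terms) of a non-negative matrix A."""
    A = np.asarray(A, dtype=float)
    d1 = 1.0 / np.sqrt(A.sum(axis=1))       # diagonal of D1^{-1/2} (row degrees)
    d2 = 1.0 / np.sqrt(A.sum(axis=0))       # diagonal of D2^{-1/2} (column degrees)
    An = d1[:, None] * A * d2[None, :]      # D1^{-1/2} A D2^{-1/2}
    U, s, Vt = np.linalg.svd(An, full_matrices=False)
    l = max(1, math.ceil(math.log2(k)))
    # Skip the trivial leading singular pair and embed rows and columns together.
    Z = np.vstack([d1[:, None] * U[:, 1:l + 1],
                   d2[:, None] * Vt[1:l + 1, :].T])
    labels = kmeans(Z, k)
    n_rows = A.shape[0]
    return labels[:n_rows], labels[n_rows:]
```

Documents and terms are embedded in the same space and clustered together. Each document cluster therefore comes paired with a term cluster, which acts as a rough set of descriptive features for it. For example, on a 6x6 matrix with two dense diagonal blocks over a weak uniform background, the first three documents and first three terms share one label and the remaining ones share the other.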

Similar resources

Efficient high dimension data clustering using constraint-partitioning k-means algorithm

With the ever-increasing size of data, clustering of large dimensional databases poses a demanding task that should satisfy both the requirements of the computation efficiency and result quality. In order to achieve both tasks, clustering of feature space rather than the original data space has received importance among the data mining researchers. Accordingly, we performed data clustering of h...

Prediction-Constrained Topic Models for Antidepressant Recommendation

Supervisory signals can help topic models discover low-dimensional data representations that are more interpretable for clinical tasks. We propose a framework for training supervised latent Dirichlet allocation that balances two goals: faithful generative explanations of high-dimensional data and accurate prediction of associated class labels. Existing approaches fail to balance these goals by ...

Interpretable classifiers using rules and Bayesian analysis: Building a better stroke prediction model

We aim to produce predictive models that are not only accurate, but are also interpretable to human experts. Our models are decision lists, which consist of a series of if . . . then. . . statements (e.g., if high blood pressure, then stroke) that discretize a high-dimensional, multivariate feature space into a series of simple, readily interpretable decision statements. We introduce a generati...

A Least Squares Approach to Estimating the Average Reservoir Pressure

Least squares method (LSM) is an accurate and rapid method for solving some analytical and numerical problems. This method can be used to estimate the average reservoir pressure in well test analysis. In fact, it may be employed to estimate parameters such as permeability (k) and pore volume (Vp). Regarding this point, buildup, drawdown, late transient test data, modified Muskat method, interfe...

Warped Mixtures for Nonparametric Cluster Shapes

A mixture of Gaussians fit to a single curved or heavy-tailed cluster will report that the data contains many clusters. To produce more appropriate clusterings, we introduce a model which warps a latent mixture of Gaussians to produce nonparametric cluster shapes. The possibly low-dimensional latent mixture model allows us to summarize the properties of the high-dimensional clusters (or density...

Publication date: 2005